Morphological decomposition in Arabic ASR systems

Authors

  • Frank Diehl
  • Mark J. F. Gales
  • Marcus Tomalin
  • Philip C. Woodland
Abstract

In recent years, the use of morphological decomposition strategies for Arabic Automatic Speech Recognition (ASR) has become increasingly popular. Systems trained on morphologically decomposed data are often used in combination with standard word-based approaches, and they have been found to yield consistent performance improvements. The present article contributes to this ongoing research endeavour by exploring the use of the ‘Morphological Analysis and Disambiguation for Arabic’ (MADA) tools for this purpose. System integration issues concerning language modelling and dictionary construction, as well as the estimation of pronunciation probabilities, are discussed. In particular, a novel solution for morpheme-to-word conversion is presented which makes use of an N-gram Statistical Machine Translation (SMT) approach. System performance is investigated within a multi-pass adaptation/combination framework. All the systems described in this paper are evaluated on an Arabic large vocabulary speech recognition task which includes both Broadcast News and Broadcast Conversation test data. It is shown that the use of MADA-based systems, in combination with word-based systems, can reduce the Word Error Rates by up to 8.1% relative.

Corresponding author: F. Diehl. Email addresses: {fd257, mjfg, mt126, pcw}@eng.cam.ac.uk (F. Diehl, M.J.F. Gales, M. Tomalin, and P.C. Woodland). Preprint submitted to Elsevier, August 25, 2011.
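
The abstract quotes its gain as a relative Word Error Rate reduction. As a minimal sketch of how such a figure is typically computed, the snippet below divides the absolute WER difference by the baseline WER. The function name and the two WER values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer over baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (in percent) for a word-based system alone and for a
# word-based + morpheme-based combination; NOT figures from the paper.
baseline = 20.0
combined = 18.4

absolute = baseline - combined
relative = relative_wer_reduction(baseline, combined)
print(f"absolute reduction: {absolute:.1f} points, relative reduction: {relative:.1f}%")
```

Under these assumed numbers, a 1.6-point absolute drop corresponds to a relative reduction of about 8%, which shows why relative and absolute figures should not be compared directly.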

Similar resources

Handling OOV words in Arabic ASR via flexible morphological constraints

We propose a novel framework to detect and recognize out-of-vocabulary (OOV) words in automated speech recognition (ASR). In the proposed framework, a hybrid language model combining words and sub-word units is incorporated during ASR decoding, and then three different OOV word recognition methods are applied to generate OOV word hypotheses. Specifically, dictionary lookup, morphological composition,...

On the use of morphological analysis for dialectal Arabic speech recognition

Arabic has a large number of affixes that can modify a stem to form words. In automatic speech recognition (ASR) this leads to a high out-of-vocabulary (OOV) rate for a typical lexicon size, and hence a potential increase in WER. This is even more pronounced for dialects of Arabic, where additional affixes are often introduced and the available data is typically sparse. To address this problem we ...

Morphological analysis and decomposition for Arabic speech-to-text systems

Language modelling for a morphologically complex language such as Arabic is a challenging task. Its agglutinative structure results in data sparsity problems and high out-of-vocabulary rates. In this work these problems are tackled by applying the MADA tools to the Arabic text. In addition to morphological decomposition, MADA performs context-dependent stem-normalisation. Thus, if word-level sy...

Morphological Decomposition for ASR in German

In this contribution we report on our ongoing work in lexical decomposition for automatic speech recognition (ASR). Lexical decomposition is investigated with a twofold goal: lexical coverage optimization and improved automatic letter-to-sound conversion. Whereas morphological decomposition is a widely-studied domain in linguistics, our interest is limited here to identifying and processing the ...

Journal title:
  • Computer Speech & Language

Volume: 26

Publication date: 2012